lu decomposition
Determinant Estimation under Memory Constraints and Neural Scaling Laws
Ameli, Siavash, van der Heide, Chris, Hodgkinson, Liam, Roosta, Fred, Mahoney, Michael W.
Calculating or accurately estimating log-determinants of large positive semi-definite matrices is of fundamental importance in many machine learning tasks. While its cubic computational complexity can already be prohibitive, in modern applications, even storing the matrices themselves can pose a memory bottleneck. To address this, we derive a novel hierarchical algorithm based on block-wise computation of the LDL decomposition for large-scale log-determinant calculation in memory-constrained settings. In extreme cases where matrices are highly ill-conditioned, accurately computing the full matrix itself may be infeasible. This is particularly relevant when considering kernel matrices at scale, including the empirical Neural Tangent Kernel (NTK) of neural networks trained on large datasets. Under the assumption of neural scaling laws in the test error, we show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws. This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset; in our experiments, this results in a $\sim$100,000$\times$ speedup with improved accuracy over competing approximations. Using these techniques, we successfully estimate log-determinants for dense matrices of extreme sizes, which were previously deemed intractable and inaccessible due to their enormous scale and computational demands.
Comparative Analysis of Linear Regression, Gaussian Elimination, and LU Decomposition for CT Real Estate Purchase Decisions
This paper presents a comprehensive evaluation of three distinct computational algorithms applied to the decision-making process of real estate purchases. Specifically, we analyze the efficacy of Linear Regression from Scikit-learn library, Gaussian Elimination with partial pivoting, and LU Decomposition in predicting the advisability of buying a house in the State of Connecticut based on a set of financial and market-related parameters. The algorithms' performances were compared using a dataset encompassing town-specific details, yearly data, interest rates, and median sale ratios. Our results demonstrate significant differences in predictive accuracy, with Linear Regression and LU Decomposition providing the most reliable recommendations and Gaussian Elimination showing limitations in stability and performance. The study's findings emphasize the importance of algorithm selection in predictive analytic and offer insights into the practical applications of computational methods in real estate investment strategies. By evaluating model efficacy through metrics such as R-squared scores and Mean Squared Error, we provide a nuanced understanding of each method's strengths and weaknesses, contributing valuable knowledge to the fields of real estate analysis and predictive modeling.
Alpha Elimination: Using Deep Reinforcement Learning to Reduce Fill-In during Sparse Matrix Decomposition
A large number of computational and scientific methods commonly require decomposing a sparse matrix into triangular factors as LU decomposition. A common problem faced during this decomposition is that even though the given matrix may be very sparse, the decomposition may lead to a denser triangular factors due to fill-in. A significant fill-in may lead to prohibitively larger computational costs and memory requirement during decomposition as well as during the solve phase. To this end, several heuristic sparse matrix reordering methods have been proposed to reduce fill-in before the decomposition. However, finding an optimal reordering algorithm that leads to minimal fill-in during such decomposition is known to be a NP-hard problem. A reinforcement learning based approach is proposed for this problem. The sparse matrix reordering problem is formulated as a single player game. More specifically, Monte-Carlo tree search in combination with neural network is used as a decision making algorithm to search for the best move in our game. The proposed method, Alpha Elimination is found to produce significantly lesser non-zeros in the LU decomposition as compared to existing state-of-the-art heuristic algorithms with little to no increase in overall running time of the algorithm.
LU decomposition and Toeplitz decomposition of a neural network
Liu, Yucong, Jiao, Simiao, Lim, Lek-Heng
It is well-known that any matrix $A$ has an LU decomposition. Less well-known is the fact that it has a 'Toeplitz decomposition' $A = T_1 T_2 \cdots T_r$ where $T_i$'s are Toeplitz matrices. We will prove that any continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ has an approximation to arbitrary accuracy by a neural network that takes the form $L_1 \sigma_1 U_1 \sigma_2 L_2 \sigma_3 U_2 \cdots L_r \sigma_{2r-1} U_r$, i.e., where the weight matrices alternate between lower and upper triangular matrices, $\sigma_i(x) := \sigma(x - b_i)$ for some bias vector $b_i$, and the activation $\sigma$ may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., $f \approx T_1 \sigma_1 T_2 \sigma_2 \cdots \sigma_{r-1} T_r$ to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary width versions. Since our results apply in particular to the case when $f$ is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices sharply reduces the number of training parameters with almost no noticeable effect on test accuracy.
A Gentle Introduction to Matrix Factorization for Machine Learning - Machine Learning Mastery
The LU decomposition is found using an iterative numerical process and can fail for those matrices that cannot be decomposed or decomposed easily. A variation of this decomposition that is numerically more stable to solve in practice is called the LUP decomposition, or the LU decomposition with partial pivoting. The rows of the parent matrix are re-ordered to simplify the decomposition process and the additional P matrix specifies a way to permute the result or return the result to the original order. There are also other variations of the LU. The LU decomposition is often used to simplify the solving of systems of linear equations, such as finding the coefficients in a linear regression, as well as in calculating the determinant and inverse of a matrix.
Generalized Non-orthogonal Joint Diagonalization with LU Decomposition and Successive Rotations
Gong, Xiao-Feng, Wang, Xiu-Lin, Lin, Qiu-Hua
Non-orthogonal joint diagonalization (NJD) free of prewhitening has been widely studied in the context of blind source separation (BSS) and array signal processing, etc. However, NJD is used to retrieve the jointly diagonalizable structure for a single set of target matrices which are mostly formulized with a single dataset, and thus is insufficient to handle multiple datasets with inter-set dependences, a scenario often encountered in joint BSS (J-BSS) applications. As such, we present a generalized NJD (GNJD) algorithm to simultaneously perform asymmetric NJD upon multiple sets of target matrices with mutually linked loading matrices, by using LU decomposition and successive rotations, to enable J-BSS over multiple datasets with indication/exploitation of their mutual dependences. Experiments with synthetic and real-world datasets are provided to illustrate the performance of the proposed algorithm.